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ON ADAPTIVE WEIGHTED POLYNOMIAL PRECONDITIONING 
FOR HERMITIAN POSITIVE DEFINITE MATRICES 

BERND FISCHER* AND ROLAND W. FREUND* 

Abstract. The conjugate gradient algorithm for solving Hermitian positive definite linear systems is 
usually combined with preconditioning in order to speed up convergence. In recent years, there has been 
a revival of polynomial preconditioning, motivated by the attractive features of the method on modern 
architectures. Standard techniques for choosing the preconditioning polynomial are based only on bounds 
for the extreme eigenvalues. Here a different approach is proposed, which aims at adapting the preconditioner 
to the eigenvalue distribution of the coefficient matrix. The technique is based on the observation that good 
estimates for the eigenvalue distribution can be derived after only a few steps of the Lanczos process. is 
information is then used to construct a weight function for a suitable Chebyshev approximation problem. 
The solution of this problem yields the polynomial preconditioner. In particular, we investigate the use of 
Bernstein-Szego weights. 

Key words, linear systems, Hermitian positive definite matrices, conjugate gradient algorithm, polyno- 
mial preconditioning, Chebyshev approximation problem, Bernstein-Szego weights 
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1. Introduction. One of the most powerful iterative schemes for solving Hermitian 
positive definite linear systems 

(1.1) Ax = b 

is the conjugate gradient algorithm (CG) of Hestenes and Stiefel [13], especially when it 
is combined with preconditioning [4]. In recent years, there has been much interest in 
polynomial preconditioning. The basic idea is as follows: instead of solving the original 
system (1.1) by CG, the CG iteration is applied either to 

(1.2) V>( A)Ax = ip{A)b 
(left preconditioning), or to 

(1.3) Aip{A)y = 6, x = ip{A)y 

(right preconditioning). Here ip is a suitably chosen polynomial of small degree. Moreover, 
it is required that none of the zeros of ip coincides with an eigenvalue of A. This guarantees 
that the preconditioned systems (1.2) and (1.3) are both equivalent to (1-1). 

Polynomial preconditioning goes back to the 1950s. It seems that Lanczos [17] was the 
first to mention the idea; interestingly, his paper is never referenced. Stiefel [23] used poly- 
nomial preconditioning techniques to accelerate eigenvalue computations. Rutishauser [20] 
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proposed an inner-outer iteration process, with CG as the outer iteration and the Cheby- 
shev semi-iterative method [12] as the inner recursion. The motivation for his approach was 
to reduce roundoff in the classical CG algorithm. In the 1980s, starting with the work of 
Johnson, Micchelli, and Paul [14], there has been a revival of Rutishauser’s method and 
polynomial preconditioning in general, see [21, 7, 19, 2] and the references given there. The 
main reason for this renewed interest is that polynomial preconditioning is an attractive 
technique on vector and parallel computers (see, e.g., [22]). Each CG iteration involves the 
computation of inner products, which constitutes a bottleneck on many modern architec- 
tures. Typically, polynomial preconditioning reduces the total number of inner products, 
since the multiplication by the preconditioning matrix x^(A) does not require inner products. 

For the Chebyshev iteration in Rutishauser’s method, estimates a and b for the smallest 
and largest eigenvalues of A are needed. Often, good upper bounds b can be obtained easily, 
using simple techniques such as Gershgorin’s theorem [21]. It is far more difficult to estimate 
the smallest eigenvalue. Saad [21] has proposed a polynomial preconditioning technique that 
only requires an upper bound for the largest eigenvalue, while the trivial bound a 0 is 
used for the smallest eigenvalue of the positive definite matrix A. His technique is based 
on least squares polynomials associated with the family of Jacobi weights [24]. Ideally, one 
would choose the weight function such that CG for the preconditioned systems (1-2) and 
(1.3) converges as fast as possible. However, this problem is not addressed in [21]. 

Another option is to construct polynomial preconditioners via weighted Chebyshev ap- 
proximation problems. This was proposed by Freund [7] who also suggested a heuristic for 
adapting the weight function to the eigenvalue distribution of A. The technique exploits the 
observation that eigenvalue distributions of Hermitian matrices can be surprisingly well esti- 
mated, using only a few steps of the Lanczos method. Actually, spectral estimation based on 
the Lanczos process is a widely used technique in applications (see, e.g., [25]). The obtained 
estimated eigenvalue distribution is then used to construct a weight function, and finally the 
polynomial preconditioner is computed by solving the corresponding approximation problem. 

In this paper, we further study polynomial preconditioning based on weighted Cheby- 
shev approximation problems. In particular, we investigate the use of Bernstein- Szego 
weights [24]. For such weights, the solutions of the associated approximation problems 
are known explicitly. Therefore, the construction of preconditioning polynomials based on 
Bernstein- Szego weights does not involve the numerical solution of an approximation prob- 
lem. 

The remainder of this paper is organized as follows. In § 2, we recall some basic prop- 
erties of CG, and we discuss Chebyshev polynomial preconditioning. In § 3, we present our 
approach to polynomial preconditioning based on weighted Chebyshev approximation prob- 
lems, and we propose a procedure for obtaining a suitable weight function from the Lanczos 
process. This technique involves the construction of a monotone interpolant. In § 4, we 
briefly describe a procedure for monotone piecewise cubic interpolation. In § 5, we consider 
polynomial preconditioners based on Bernstein-Szego weights. Finally, in § 6, we make some 
concluding remarks. 

Throughout the paper, it is assumed that A in (1.1) is a Hermitian positive definite 
TV x TV matrix, with real or complex entries. As usual, denotes the conjugate transpose 
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of a matrix Af. The vector norm ||x|| = y/x"x is always the Euclidean norm. Finally, we 
denote by 

“P n := {^(A) = <7'o + C 7 'iA + ,, " + | <7 0 , <X\1 • • • i a n € C} 

the set of all complex polynomials of degree at most n. 

2. CG and Chebyshev polynomial preconditioning. In this section, we collect 
some basic facts about CG, and we review Chebyshev polynomial preconditioning. 

2.1. The CG algorithm. Let x 0 G C N be any initial guess for the solution of (1.1), 
and let r 0 := 6— Ax 0 be the associated residual vector. The CG algorithm generates iterates 
of the form 

(2.1) x n = *0 + where x„ e ^n-D n = l,2,.... 

The corresponding residual vectors are given by 

(2.2) r n = <f> n (A)r 0 , where <£ n (A) = 1 - Ax„(A). 

In exact arithmetic, the CG algorithm terminates after a finite number of steps with the 
exact solution x ^ — A~^b of (1.1). In the sequel, L always denotes this termination index. 
We remark that L is just the minimal number of components in any expansion of r 0 into 
orthonormal eigenvectors Vj of A, and thus we have 

L 

(2.3) r 0 - where for all j. 

i - 1 

In particular, L < N. In the following, we always assume that the vectors v- have been scaled 
such that ctj > 0 in (2.3). Furthermore, we denote by A_, the eigenvalues corresponding to 
Vj, i.e., 

(2.4) Avj = A jVj. 

Clearly, the Aj’s are distinct, and from now on, we assume that they are numbered in 
increasing order: 

Aj < Aj < • • • < A l . 

The CG iterates are optimal, in the sense that r^A~ l r n is minimal for all possible 
iterates of the form (2.1). This minimization property can be shown to be equivalent to the 
following orthogonality relations: 

(2.5) r n r k = 0 for all n, k = 0, 1, . . . , L, n f k. 

Using (2.2), (2.3), and (2.4), we can rewrite (2.5) in the form 

(2.6) (^ n , = 0 for all n, k = 0, 1, . . . , L , n ^ k, 
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where 

(<t>n,<l>k) ’■= r 0 <f> n (A)M A ) r O 

j) =: 

1=1 Jn 

The distribution function a(A) in (2.7) is defined by 

(2.8) o(A) = - A,), where H(\) = { J 

J=1 k 

We now turn to polynomial preconditioning. For simplicity, we will focus only on right 
preconditioning (1.3), but essentially the same statements remain true for left preconditioning 
(1.2). From now on, it is always assumed that p > 1 is a given integer, and we consider 
preconditioning polynomials xp (E in (1.3). 

2.2. Chebyshev polynomial preconditioning. The standard approach for the de- 
sign of preconditioners is to choose the polynomial xp in (1.3) such that Axp(A) is in some 
sense as close as possible to the identity matrix I. For instance, one could attempt to 
minimize the Euclidean norm ||7 — |] . However, the solution of this problem would 

require the knowledge of all eigenvalues of A. Therefore, one usually substitutes for the 
spectrum of A an interval [a, 6], where 0 < a < Aj and b > A^. This leads to the Chebyshev 
approximation problem 


(2.9) 


min max 

<p€V p : v(Q)=l A6[a,6] 


kp(A)|, 


where </?(A) = 1 — AV»(A). It is well known that the optimal solution <p p of (2.9) is just a 
suitably shifted and normalized Chebyshev polynomial of the first kind, and we have 

(2.10) <p p (\) = Tp f^\ where 7(A) = J"" and t = *(°)- 

Note that the spectrum of the preconditioned matrix Axp p (A) is contained in the interval 

(2.11) [a„,6 p ], where a p := 1 - and b p := 1 + 

In view of the optimality properties of the CG algorithm, n steps of CG applied to 
the preconditioned system (1.3) can at best give the same residual vector as np steps of 
CG applied to the original system (1.1). As in (2.2), let r np = <p np (A)r 0 denote the residual 
vector obtained after np steps of CG applied to (1.1). Similarly, let r£ p ) = <p^(Aip p (A))r 0 be 
the nth residual vector generated by CG applied to (1.3). Obviously, Chebyshev polynomial 
preconditioning is indeed best possible, provided that the residual polynomials <p np and <p^ 
satisfy 


( 2 . 12 ) 


KW = 4 P ’(AiO p (A)). 
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Next, we show that (2.12) is fulfilled if the distribution function cr(A) in (2.8) corresponds 
to the worst-case distribution. 

The function <r(A) is a step function with jumps at the eigenvalues of A. For our 
discussion, we treat cr(A) as a continuous function, and we denote by 


^(A) = 


dcr(A) 
d A 


the corresponding density function. Actually, since the dimension of A (and thus L) is 
usually large, the step function “looks like” a continuous function. From potential theory, it 
is known that the worst-case distribution for an interval is the equilibrium distribution, and 
for the case of the unit interval, this is given by 


(2.13) c E (0 = arcsin(t), t € [-1, 1]. 

Furthermore, the orthogonal polynomials with respect to the corresponding inner product 
are the Chebyshev polynomials of the first kind, and we have, for all integers n, k > 0, n ^ k, 

(2.14) T n (t)T k (t)6 E {t)dt = 0, where S E {t) = -^==. 


We remark that (2.14) can be evaluated by means of Gaussian quadrature. This gives 


(2.15) 


b-a r' 
2 7-i 


mem 


L 


s E (t)dt = 

1 


T„(<(A,))r t (f(A,)) 

T n (()T>(0 ' 


n,k < L. 


In other words, for the distribution function defined by (2.15), the CG residuals correspond 
to Chebyshev polynomials. We remark that in this case the standard error bounds (see, 
e.g., [11]) for the CG iterates are sharp. 

Next, we show that for the worst-case distribution the relation (2.12) is indeed satisfied, 
and hence Chebyshev polynomial preconditioning is optimal in this case. 

LEMMA 2.1. Let £ p denote the linear mapping that maps [n p , ^ p ] ( c /- (2.11)) onto 

the unit interval [—1,1] and let £ p = ^ p (0). Let <j> np { A) = T np (£(\))/T np (£) and <f>^ ^(A) _ 
T n {£ (A))/T n (£ p ) be the shifted and normalized Chebyshev polynomial on [a, 6] and [a p .b p ], 
respectively. Then the identity (2.12) is satisfied. 

Proof. From (2.11) and (2.10) one readily obtains 


^' P| (^ P (A)) = 


r,(r,(<(A)» 

T n (T p {0) ’ 


Equation (2.12) then follows from the well-known identity T np (t) = T n (T p (t)). □ 

3. Weighted polynomial preconditioning. As discussed in the previous section, 
Chebyshev polynomial preconditioning is optimal for matrices A, for which the function 
cr(A) in (2.8) is the worst-case distribution. However, in practice, linear systems, especially 
those arising in the numerical treatment of partial differential equations, have eigenvalue dis- 
tributions that are far from the worst case. For those, Chebyshev polynomial preconditioning 
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is not optimal. Furthermore, it is known [1] that repeated applications of Chebyshev poly- 
nomials will transform any given distribution into the worst-case distribution. This behavior 
is also reflected in Chebyshev polynomial preconditioning. If A has a favorable eigenvalue 
distribution then the eigenvalue distribution of the preconditioned system is usually much 
closer to the worst case. 

In this section, we propose a heuristic for adapting the preconditioning polynomial to 
the actual eigenvalue distribution of A. 


3.1. Weighted Chebyshev approximation problems. Instead of (2.9), we now 
consider weighted Chebyshev approximation problems of the form 


(3.1) 


min max 
(fi€V p : v?(0)=l A€[a,6] 


|u;(A)y>(A)|. 


Here in is a continuous weight function on [a, 6], and it is always assumed that tn(A) > 0 
on the open interval (a, 6). Standard results from approximation theory (see, e.g., [18]) 
guarantee that there exists a unique optimal polynomial <p p for (3.1). In general, (p p is not 
known explicitly, and one needs to solve (3.1) numerically, for example, using the Remez 
algorithm (see, e.g., [18, 10]). 

If tD(A) = 1, then the shifted and scaled Chebyshev polynomial (2.10) is the solution of 

(3.1). The idea now is to use the optimal solution (p p as a polynomial preconditioner, where 
the weight function w is chosen based on an estimate for the density S of the eigenvalue 
distribution of A. It is more convenient to rewrite (3.1) for the unit interval [—1,1] instead of 
[a, 6]. Using the transformation 1(A) defined in (2.10), we obtain the approximation problem 


(3.2) min max \w{t)y „(<)l, £g[-l,l]- 

It remains to give a heuristic for the choice of the weight function w in (3.2). If the estimated 
density S of A happens to be the worst-case density 6 E (cf. (2.14)), then the solution of (3.2) 
should yield the optimal preconditioner for the worst case. As shown in §2, Chebyshev 
polynomials are optimal in this case. Clearly, the weight function should be constructed 
such that in(A) = 1 if 6(t) = l/\/l — f 2 . Therefore, we suggest the choice 

(3.3) w(t) = 


3.2. Estimating the distribution function. Usually, a good estimate for the eigen- 
value distribution <r(A) in (2.8) can be derived from the quantities generated by relatively 
few steps of the Lanczos process [16]. Actually, since the Lanczos algorithm is equivalent to 
CG, we can extract all necessary information from a few steps of the CG algorithm applied 
to the original system (1.1). 

Suppose we have run CG for n steps. Then it has generated the entries of the n x n 
tridiagonal Lanczos matrix 


(3.4) 


Q 1 02 0 

02 a 2 03 

0 /? 3 

0 ••• 0 


•. 0 • 
0n 

Pn a n - 
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The eigenvalues of T n (the so-called Ritz values) are all distinct, and we assume that they 
are numbered in increasing order: 


e, < e 2 < • • • < e n . 

Let s, be a set of corresponding orthonormal eigenvectors, i.e., 

T n s, = 0 iSi , |h|| = 1, 

Moreover, we assume that the s^s are normalized such that their first components, := (s^)j, 
are all real and nonnegative. 

It is well known that the CG residual polynomials <j> 0 , . . ,<j) n (cf. (2.2)) are orthog- 

onal with respect to the discrete inner product induced by T n . More precisely, we have 

n = ° for all j,k = 0, l,...,n, j ± k, 

= ='■ 

The distribution function t(A) in (3.6) is defined by 

r(A) = •£ t?H(\ - #<), 

•=1 

where H is given in (2.8). It can be shown that (•,•)„ and the inner product (•,•) in (2.7) 
have the same (modified) moments up to degree n — 1, i.e., for all real polynomials <f> of 
degree at most n — 1, it holds that 

(3.7) (l,*) n = (l,*>. 

We can then apply a result by Karlin and Shapley [15, Theorem 22.1], which, roughly 
speaking, states that the step function t(A) has to be close to <x(A). More precisely, their 
theorem states that the condition (3.7) implies that <r(A) — t(A) has at least n — 1 sign 
changes in [A min (A), A max (/1)]. Here, A min (A) and X mAX (A) denote the smallest and largest 
eigenvalues of A, respectively. 

Therefore, we use a monotone CMnterpolant of the step function t(A) as our estimation 
for the eigenvalue distribution a of A. For simplicity, we again transform the interval [a, b] 
to the unit interval [—1,1], using the linear map ^(A) defined in (2.10). We note that we 
only require that the lower bound a is nonnegative. In particular, the trivial choice a = 0 is 
feasible. Since the largest Ritz value 0 n is usually a good approximation for A max (A), we set 
b := 0 n . Finally, we always denote by f(t) the estimate for the eigenvalue distribution <r(A) 
transformed to [—1,1]. 


(3.5) 
where 

(3.6) 
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The interpolating points are chosen as follows. We set 


and 


{ 0 for i = 0, 

i{6 { ) for i = l,2,...,n — 1, 
1 for i = n, 


t?, :=< 


f 0 

r/72 + i>; 

i=i 

1 


for i = 0, 
for i = 1,2, . . 
for i = n. 


n — 1, 


The estimated distribution is then chosen as a monotone function / € C*[— 1, 1] satisfying 


(3.8) 


/(<■•) = •>,. 


i — 0, l,...,n. 


4. Monotone piecewise cubic interpolation. One obvious choice for the interpolat- 
ing function / is a monotone piecewise cubic interpolant. In this section we briefly describe 
how to construct such a function. We follow the derivation of Fritsch and Carlson [9] and 
Fritsch and Butland [8]. 

Let — 1 = t 0 < t l < . . . < t n = 1 and 0 = t> 0 < i), < ... < t? n = 1 be given. Our goal is 
to construct a piecewise cubic function S £ C 1 [— 1, 1] such that 


(4.1) 


S(ti) = #i, * = 0, 1, . . . ,n, 


and S is monotone on [-1,1]. To this end, let h { = t i+1 — t i and A; = (i? t+1 — i — 

0, 1, . . . , n— 1. The trick is to express S, on any subinterval in terms of the derivatives 

< = S'iti), i = 0, 1, . . . , n, (cf. [9]) 


S(t) = 


< h±\ 2A j 

hi 


(t ~ t t ) 3 + 


-2d, - d- +1 + 3A t - 

h , 




By construction, any choice of the free parameters d i leads to a function S 6 C 1 [— 1, 1] that 
fulfills (4.1). The remaining step is to adjust the s such that S is monotone on [—1, 1]. 

In the literature one can find several schemes for computing the (not uniquely deter- 
mined) d t . Here we used a formula proposed by Brodlie [3] and Fritsch and Butland [8]: 


(4.2) 


d ; =l 


A 


+ (i - Q&i-i 

o 


for A._ X A,- > 0, 
otherwise, 


i = 1,2, . . . ,n — 1, 


where £,• = (A,_ x + 2h i )/(3(h i _ 1 + /i,)). In addition to (4.2), we still need to choose the 
boundary conditions d 0 and d n . Since we have no information about the endpoint derivatives 
available, we select a (weak) version of the so-called “not-a-knot” condition (cf. De Boor [5, 
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pp. 54]). Here one chooses d 0 and d n such that S is twice continuously differentiable on 
[f 0 ,f 2 ) and respectively. One obtains 


(4.3) 


7^(3A X — (2<fj + d 2 )) + 3A 0 — 2d,, 

"1 

n-1 (3A n _ 2 — (2d n _ 1 + d n _ 2 )) + 3A n _ x — 2d n _ 1 . 

”n— 2 


However, this special choice of d 0 and d n does not necessarily produce a monotone S on the 
subintervals [t 0 , f x ] an d K»-n*»]> respectively. 

Note that the additional requirements d 0 E [0, 3A 0 ] and d n E [0, 3A n _j] will lead to a 
monotone function. Thus, if, e.g., d 0 (computed by (4.3)) turns out to be negative or bigger 
than 3A 0 , we simply set d 0 = 0 or d 0 = 3A 0 , respectively. 

We would like to mention that FORTRAN versions for the described procedures are 
available in NETLIB (PCHIP package). 

Finally, we set 6(t) = S'(t), where S is the computed monotone interpolant, and then 
define the weight function w(t ) = w(t] S ) by (3.3). The desired polynomial preconditioner 
is then obtained by (numerically) solving the resulting weighted Chebyshev approximation 
problem (3.2). 


5. Bernstein-Szego weight functions. In the preceding section we first approxi- 
mated the distribution function and then solved the resulting approximation problem (3.2) 
numerically. Here we follow a different approach which will eliminate the latter approxima- 
tion process. The idea is to restrict the weight functions to a class, for which the solution 
of the Chebyshev problem (3.2) is explicitly known. We consider three classes of weight 
function that fulfill this requirement. 

Let p k be a real polynomial of degree k with p k (t) > 0 on [—1,1], and define 


(5.1) s 0 (t) = 1, s 1/2 (t) = y/lTi, s,(f) = %/l-< 2 . 


Then (3.2) can be solved explicitly for the so-called Bernstein-Szego weight functions 


(5.2) 



ie (o, 1/2,1}; 


see Szego [24], Freund [6], and the references given therein. More precisely, the solution <p p 
of (3.2) with respect to Wj, j G {0, 1/2, 1}, is explicitly known for p>Pj, where 


(0 if j = 1 and k = 0, 

Pj ~ \ L(^ + l)/2 — j\ otherwise. 


For convenience, in the sequel, we allow p k to have (simple) zeros at the endpoints 
±1, i.e., the cases j = 0, 1/2 are now included in the case j = 1. Therefore, we will only 
investigate weight functions of the form 


w(t) = tO^f) = 


\l7k (<) 


(5.3) 
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In view of (3.3), we have to adjust p k such that 

r \ ./i _ t 2 

< 5 - 4 > /(A) ■ /., ~^r dt 

fulfills the interpolatory conditions f(tj) = 0j (cf. (3.8)). Note that /, defined by (5.4), is 
“automatically” monotone on [—1, 1], and that /( — 1) = 0 for every p k . 

It turns out to be advantageous to express p k in terms of its zeros 


m l 

(5.5) p k {t) = a m +i II(* — a j) II — z j)(^ — k = m + 21. 

3 = 1 3 = 1 

Here, a j = 1,2,..., m, are the real zeros and Zj = Xj + ij/j, Vj > 0, j = 1, 2, ...,/, are the 
complex zeros in the upper half plane. The partial fractions expansion of (5.4) reads 


(5.6) 


where 


flx) = j_(y a [ x ^Hj-dt+Y r w ± 

= F{\\ Oj, . . . , Q m +ii • • • > Vii • • • i ih)i 


dt 


A-i = -77 — r, i = 1,2, ... ,m, 
J Pk( a j ) 


+ c,. ^ 


_ Pfc(*j)(* - z j) + p'kiZjW - *j) 


Pk( z j)p'k( z j) 


j = 1,2,...,/. 


It is readily verified that all integrals in (5.6) have an explicit antiderivative. We omit these 
routine, but somewhat lengthy, calculations. 

Now the interpolatory conditions /(<,•) = (cf. (3.8)) and the positivity constraints 
p k (t) > 0, t € (—1, 1) lead to the following nonlinear constraint interpolation problem: 

find real numbers a x , . . . , a m+1 , x x , . . . , x t , y x , . . . , y t subject to 

<*|, . . • , • • • , ®j, yi ,..•,!//) J 1, 2, . . . , n, 

(5.7) l a j| ^ lj j = 2, . . • , 37i, 

m 

sgn(a m+x ) = (-l) m II s S n ( a j)’ 

j=i 

J/j > 0, j — 1,2,...,/. 


Apart from the sign conditions, the problem (5.7) can be viewed as a nonlinear system of n 
equations infc + l = m-f-l + 2/ unknowns. Therefore, a natural choice of the polynomial 
degree is k = n — 1. Also, we would like to mention that (5.7) does not address the problem 
of (possible) multiple zeros at ±1. However, this problem never occurred in our experiments. 
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The success of any nonlinear solver applied to (5.7) depends strongly on good starting 
values. We will now describe a (linear) method for obtaining such values. In view of (5.4) 
we have 


ftwm = vt - a 2 , 


and consequently 

(5.8) (Pk( X )f( X ))' = PkW'fW + V ^ 1 “ A2- 

Hence, by integrating (4.8) and using /(— 1) = 0, we obtain the identity 

(5.9) p fe (A)/(A) = £ p k {t)'f(i)dt + 5(A), 


where 5 (A) = S-i y/T^Pdt. Setting 


(5.10) 


53^jA J , bj £ R, 
3 - 0 
A 


7j(A ;/) 3 jTV'/tO* 


in equation (5.9) gives 

(5.11) £ (AV(A) - i7,-(A; /)) = 0(A)- 

j=o 

Then we end up with a semiinfinite linear problem: 
find real numbers j = 0,1,..., fc, subject to 

G(*,A,---A) = 0, * = 1,2, ... ,n, 

(512) £6/ > 0, * e (-1,1), 

j=0 

where 

G(t,; V • • • A) = E b j {Wi ~ /)) “ $(*.-)• 

j=0 

The price paid for the linearity of the problem are the infinitely many constraints In addi- 
tion, we have to evaluate the integrals 7j which involve the unknown function / (cf. (5.10)). 
To overcome this problem, we approximate / by a monotone piecewise cubic function 5, as 
described in §4. Clearly, the resulting integrals 7 j (A; S) are easy to compute. 

A straightforward approach for attacking semiinfinite problems is to replace the infinitely 
many constraints by a finite subset. Finally, after relaxing the interpolatory conditions 
slightly, we obtain the following linear programming problem: 
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find real numbers b - , j = 0, 1, . . . , k, subject to 

'E l G(t i -,b 0 ,...,b k ) = min!, 

1=1 

(5.13) G(ti’,b 0 ,...,b k ) > 0 t = l,2,...,«, 

£ b-i{ > 0, £, €(-1,1), t = 1,2, ... ,M. 

j=o 

It is not hard to check that the feasible set of (5.13) is not empty, and hence (5.13) always 
has a solution pj(f ; A/) = ^2j= o bjt* . Here the second parameter M indicates that p\ depends 
on the number of positivity constraints. Unfortunately, it is possible that p k (t] M) has zeros 
in (—1, 1). Here, one basically has to distinguish between the cases that />£(<; M) has one zero 
H in the first (last) interval (-1, £,) ((( M , 1)) or two zeros n x ,n 2 in (<,-,t, +1 ), i € {0, 1, . . . , A/}, 
where £ 0 := — 1 and t M+1 := 1. Such zeros can not be used as starting values for the nonlinear 
problem (5.7). Therefore, we replace /z by —1 or 1 and the real zeros /i x , p 2 by the complex 
zeros z = (fi t + H 2 )/2 + *£, 2, where e is some small positive number. Notice that 

(t - ^)(t - H ) - (t - z)(t -z) = ^(/z 1 - fi 2 ) 2 + e 2 < | £, - £ f+1 | 2 + £ 2 . 

For sufficiently large M and sufficiently small e, this “zero-substitution” only slightly per- 
turbs p*(t; M). 

In our experiments, we always chose the < t ’s as Chebyshev knots with M = 200. The 
corresponding linear solution p* k (t ; M) always served as a good starting guess for the nonlinear 
solver applied to (5.7). 

6. Concluding remarks. On modern architectures, it is attractive to combine the con- 
jugate gradient algorithm with polynomial preconditioning. In this paper, we have presented 
an approach for adapting polynomial preconditioners to the actual eigenvalue distribution 
of the coefficient matrix of the linear system. Our technique is based on the observation 
that good estimates for the eigenvalue distribution can be derived after only a few steps of 
the Lanczos process. We then use this information to construct a weight function for a suit- 
able Chebyshev approximation problem. The solution of this problem yields the polynomial 
preconditioner. We have explored the use of Bernstein-Szego weights. 

This manuscript is still a preliminary and incomplete version. In the final paper, we will 
also include numerical results. 

Acknowledgement. We would like to thank Youcef Saad for bringing reference [17] to 
our attention. 
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